Policy Search in Continuous Action Domains: an Overview
Authors
Abstract
Continuous action policy search, the search for efficient policies in continuous control tasks, is currently the focus of intensive research, driven both by the recent success of deep reinforcement learning algorithms and by the emergence of competitors based on evolutionary algorithms. In this paper, we present a broad survey of policy search methods, incorporating these very different approaches into a common big picture, together with alternatives such as Bayesian Optimization and directed exploration methods. The main message of this overview concerns the relationships between the families of methods, but we also outline some factors underlying the sample efficiency properties of the various approaches. Finally, to keep this survey as short and didactic as possible, we do not go into the details of the mathematical derivations of the elementary algorithms.
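As an illustration of the generic setting the survey covers, the sketch below shows a minimal episodic, gradient-free policy search loop over the parameters of a deterministic linear policy. The toy double-integrator environment, the (1+1) hill-climbing update, and all names are illustrative assumptions, not any specific method from the paper.

```python
# A minimal sketch of episodic, gradient-free policy search over the parameters
# of a deterministic linear policy. The toy environment (a 1-D point mass that
# must reach the origin) is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)

def rollout(theta, horizon=50):
    """Return the episodic return of the linear policy a = theta . [x, v]."""
    x, v = 1.0, 0.0          # initial state: position 1, velocity 0
    ret = 0.0
    for _ in range(horizon):
        a = float(np.clip(theta @ np.array([x, v]), -1.0, 1.0))  # continuous action
        v += 0.1 * a          # simple double-integrator dynamics
        x += 0.1 * v
        ret += -(x ** 2 + 0.1 * a ** 2)   # quadratic cost as negative reward
    return ret

theta = np.zeros(2)           # policy parameters
sigma = 0.5                   # exploration noise in parameter space
for it in range(200):
    candidate = theta + sigma * rng.standard_normal(2)   # perturb parameters
    if rollout(candidate) > rollout(theta):              # keep the better policy
        theta = candidate
print("final parameters:", theta, "return:", rollout(theta))
```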
Similar Papers
Model-based Direct Policy Search (Extended Abstract)
Scaling Reinforcement Learning (RL) to real-world problems with continuous state and action spaces remains a challenge. This is partly because the optimal value function can become quite complex in continuous domains. In this paper, we propose not to learn the optimal value function at all, but to use direct policy search methods in combination with model-based RL instead.
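A minimal sketch of this idea, under assumptions not taken from the paper: fit a one-step dynamics model from observed transitions, then score candidate policy parameters purely by imagined rollouts in that model, so no value function is ever learned. The linear model, the toy data, and all names are illustrative.

```python
# Model-based direct policy search sketch: learn a one-step dynamics model,
# then evaluate candidate policies by rollouts inside the learned model.
import numpy as np

rng = np.random.default_rng(1)

# Transitions (s, a, s') collected from the true system (unknown to the learner).
def true_step(s, a):
    return 0.9 * s + 0.2 * a + 0.01 * rng.standard_normal()

S = rng.uniform(-1, 1, size=200)
A = rng.uniform(-1, 1, size=200)
S_next = np.array([true_step(s, a) for s, a in zip(S, A)])

# Model learning: least-squares fit of s' ≈ w_s * s + w_a * a.
X = np.stack([S, A], axis=1)
w_s, w_a = np.linalg.lstsq(X, S_next, rcond=None)[0]

def imagined_return(k, s0=1.0, horizon=30):
    """Evaluate the linear policy a = -k * s inside the learned model."""
    s, ret = s0, 0.0
    for _ in range(horizon):
        a = -k * s
        s = w_s * s + w_a * a        # model prediction, no real interaction
        ret += -(s ** 2 + 0.1 * a ** 2)
    return ret

# Direct policy search (a coarse grid here) using only the model for evaluation.
gains = np.linspace(0.0, 5.0, 51)
best_k = gains[np.argmax([imagined_return(k) for k in gains])]
print("selected gain:", best_k)
```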
Big Tobacco, Alcohol, and Food and NCDs in LMICs: An Inconvenient Truth and Call to Action; Comment on “Addressing NCDs: Challenges From Industry Market Promotion and Interferences”
In their editorial, Tangcharoensathien et al.1 describe the challenges of industry market promotion and policy interference from Big Tobacco, Alcohol, and Food in addressing non-communicable diseases (NCDs). They provide an overview of the increasing influence of corporate interest in emerging eco...
Guided exploration in gradient based policy search with Gaussian processes
Applying reinforcement learning (RL) algorithms in robotic control proves to be challenging even in simple settings with a small number of states and actions. Value function based RL algorithms require the discretization of the state and action space, a limitation that is not acceptable in robotic control. The necessity to be able to deal with continuous state-action spaces led to the use of dif...
Planning with Continuous Resources in Stochastic Domains
We consider the problem of optimal planning in stochastic domains with metric resource constraints; past work has dealt with various variants of this problem. Our goal is to generate a policy whose expected sum of rewards is maximized for a given initial state. We consider a general formulation motivated by our application domain, planetary exploration, in which the choice of an action at each st...
Mean Actor Critic
We propose a new algorithm, Mean Actor-Critic (MAC), for discrete-action continuous-state reinforcement learning. MAC is a policy gradient algorithm that uses the agent’s explicit representation of all action values to estimate the gradient of the policy, rather than using only the actions that were actually executed. This significantly reduces variance in the gradient updates and removes the n...
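A minimal sketch of the contrast MAC draws, for a softmax policy over discrete actions at a single state; the feature vector, Q-values, and parameter shapes are illustrative assumptions, not the paper's code. The sampled-action estimator uses only the executed action, while the all-action estimator averages over every action and so removes that source of variance.

```python
# Sampled-action vs. all-action ("mean") policy-gradient estimates for a
# softmax-linear policy over discrete actions at a single state.
import numpy as np

rng = np.random.default_rng(2)

n_actions, n_features = 3, 4
theta = rng.standard_normal((n_actions, n_features))   # policy parameters
phi = rng.standard_normal(n_features)                  # features of the state
Q = np.array([1.0, 2.0, 0.5])                          # action values at that state

def pi(theta):
    """Softmax policy π(·|s) with logits θ_a · φ(s)."""
    logits = theta @ phi
    z = np.exp(logits - logits.max())
    return z / z.sum()

def grad_log_pi(theta, a):
    """∇_θ log π(a|s) for the softmax-linear policy."""
    p = pi(theta)
    g = -np.outer(p, phi)
    g[a] += phi
    return g

p = pi(theta)

# Sampled-action estimator: Q(s, a) ∇ log π(a|s) for one executed action a.
a = rng.choice(n_actions, p=p)
sampled_grad = Q[a] * grad_log_pi(theta, a)

# All-action estimator: sum_a π(a|s) Q(s, a) ∇ log π(a|s); no sampling over actions.
mean_grad = sum(p[a] * Q[a] * grad_log_pi(theta, a) for a in range(n_actions))

print("sampled-action estimate:\n", sampled_grad)
print("all-action estimate:\n", mean_grad)
```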